Approximate Dynamic Programming and Reinforcement Learning

Authors

  • Lucian Busoniu
  • Bart De Schutter
  • Robert Babuska
Abstract

Dynamic programming (DP) and reinforcement learning (RL) can be used to address problems from a variety of fields, including automatic control, artificial intelligence, operations research, and economics. Many problems in these fields are described by continuous variables, whereas DP and RL can find exact solutions only in the discrete case. Therefore, approximation is essential in practical DP and RL. This chapter provides an in-depth review of the literature on approximate DP and RL in large or continuous-space, infinite-horizon problems. Value iteration, policy iteration, and policy search approaches are presented in turn. Model-based (DP) as well as online and batch model-free (RL) algorithms are discussed. We review theoretical guarantees on the approximate solutions produced by these algorithms. Numerical examples illustrate the behavior of several representative algorithms in practice. Techniques to automatically derive value function approximators are discussed, and a comparison between value iteration, policy iteration, and policy search is provided. The chapter closes with a discussion of open issues and promising research directions in approximate DP and RL.

Lucian Buşoniu, Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands, e-mail: [email protected]

Bart De Schutter, Delft Center for Systems and Control & Marine and Transport Technology Department, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands, e-mail: [email protected]

Robert Babuška, Delft Center for Systems and Control, Delft University of Technology, Mekelweg 2, 2628 CD Delft, The Netherlands, e-mail: [email protected]
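To make the value-iteration family of methods surveyed above concrete, the following is a minimal sketch of approximate value iteration on a continuous-state problem. The 1-D toy dynamics, the grid discretization (acting as the value function approximator), and all parameter values are illustrative assumptions, not material from the chapter itself.

```python
import numpy as np

gamma = 0.9                           # discount factor (assumed)
states = np.linspace(-1.0, 1.0, 21)   # discretized state grid = the approximator
actions = np.array([-0.1, 0.0, 0.1])  # small finite action set

def step(s, a):
    """Toy deterministic dynamics with a reward that penalizes distance from 0."""
    s_next = np.clip(s + a, -1.0, 1.0)
    return s_next, -s_next**2

def nearest(s):
    """Project a continuous successor state onto the grid."""
    return int(np.argmin(np.abs(states - s)))

Q = np.zeros((len(states), len(actions)))
for _ in range(200):                  # approximate value-iteration sweeps
    Q_new = np.zeros_like(Q)
    for i, s in enumerate(states):
        for j, a in enumerate(actions):
            s_next, r = step(s, a)
            # Bellman optimality backup through the approximator
            Q_new[i, j] = r + gamma * Q[nearest(s_next)].max()
    if np.max(np.abs(Q_new - Q)) < 1e-8:
        Q = Q_new
        break
    Q = Q_new

policy = actions[Q.argmax(axis=1)]    # greedy policy from the approximate Q
```

Because the backup is a contraction (modulo the projection step), the sweeps converge to a near-optimal Q-function; the greedy policy then steers every state toward the origin.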


Similar references

Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming

Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iter...


A New Hybrid Critic-training Method for Approximate Dynamic Programming

A variety of methods for developing quasi-optimal intelligent control systems using reinforcement learning techniques based on adaptive critics have appeared in recent years. This paper reviews the family of approximate dynamic programming techniques based on adaptive critic methods and introduces a new hybrid critic training method.


Optimal Learning and Approximate Dynamic Programming

Approximate dynamic programming (ADP) has emerged as a powerful tool for tackling a diverse collection of stochastic optimization problems. Reflecting the wide diversity of problems, ADP (including research under names such as reinforcement learning, adaptive dynamic programming and neuro-dynamic programming) has become an umbrella for a wide range of algorithmic strategies. Most of these invol...


Noisy K Best-Paths for Approximate Dynamic Programming with Application to Portfolio Optimization

We describe a general method to transform a non-Markovian sequential decision problem into a supervised learning problem using a K-best-paths algorithm. We consider an application in financial portfolio management where we can train a controller to directly optimize a Sharpe ratio (or other risk-averse, non-additive) utility function. We illustrate the approach by demonstrating experimental resul...


Reinforcement Learning for Matrix Computations: PageRank as an Example

Reinforcement learning has gained wide popularity as a technique for simulation-driven approximate dynamic programming. A less known aspect is that the very reasons that make it effective in dynamic programming can also be leveraged in distributed schemes for certain matrix computations involving non-negative matrices. In this spirit, we propose a reinforcement learning algorithm ...


Model-Based Reinforcement Learning with Continuous States and Actions

Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the tran...



Publication date: 2010